Abstract

We present the formal verification of synchronizing 
aspects of the Reliable Computing Platform (RCP), 
a fault-tolerant computing system for digital flight 
control applications. The RCP uses NMR-style re- 
dundancy to mask faults and internal majority vot- 
ing to purge the effects of transient faults. The sys- 
tem design has been formally specified and verified 
using the Ehdm verification system. Our formaliza- 
tion is based on an extended state machine model 
incorporating snapshots of local processors’ clocks. 

Key Words - Clock synchronization, correctness
proofs, fault tolerance, formal methods, majority voting,
modular redundancy, theorem proving, transient
fault recovery.

1 Introduction 

NASA is engaged in a major research effort towards 
the development of a practical validation and veri- 
fication methodology for digital fly-by-wire control 
systems. Researchers at NASA Langley Research 
Center (LaRC) are exploring formal verification as 
a candidate technology for the elimination of de- 
sign errors in such systems. In previous reports 
[Di Vito 1990, Di Vito 1992, Butler 1991], we put 
forward a high level architecture for a reliable com- 
puting platform (RCP) based on fault-tolerant com- 
puting principles. Central to this work is the use of 
formal methods for the verification of a fault-tolerant 
operating system that schedules and executes the 
application tasks of a digital flight control system. 
Phase 1 of this effort established results about the 
high level design of RCP. This paper discusses our 
Phase 2 results, which carry the design, specification, 
and verification of RCP to lower levels of abstraction. 
Complete details of the Phase 2 work are available 
in technical report form [Butler 1992]. 

1 Third IFIP International Working Conference on Depend- 
able Computing for Critical Applications. Mondello, Sicily, 
Italy. September 14-16, 1992. 


The major goal of this work is to produce a verified
real-time computing platform, both hardware
and operating system software, useful for a wide variety
of control-system applications. Toward this goal,
the operating system provides a user interface that 
“hides” the implementation details of the system 
such as the redundant processors, voting, clock syn- 
chronization, etc. We adopt a very abstract model 
of real-time computation, introduce three levels of 
decomposition of the model towards a physical real- 
ization, and rigorously prove that the decomposition 
correctly implements the model. Specifications and 
proofs have been mechanized using the Ehdm verifi- 
cation system [von Henke 1988]. 

A major objective of the RCP design is to enable 
the system to recover from the effects of transient 
faults. More than their analog predecessors, digital 
flight control systems are vulnerable to external phe- 
nomena that can temporarily affect the system with- 
out permanently damaging the physical hardware. 
External phenomena such as electromagnetic inter- 
ference (EMI) can flip the bits in a processor’s mem- 
ory or temporarily affect an ALU. EMI can come 
from many sources such as cosmic radiation, lightning,
or High Intensity Radiated Fields (HIRF).

RCP is designed to automatically purge the effects 
of transients periodically, provided the transient is 
not massive, that is, simultaneously affecting a ma- 
jority of the redundant processors in the system. Of 
course, there is no hope of recovery if the system 
designed to overcome transient faults contains a de- 
sign flaw. Consequently, emphasis has been placed 
on techniques that mathematically show when the 
desired recovery properties are obtained. 

1.1 Design of RCP 

We propose a well-defined operating system that pro- 
vides the applications software developer a reliable 
mechanism for dispatching periodic tasks on a fault- 
tolerant computing base that appears to him as a sin- 
gle ultra-reliable processor. A four-level hierarchical 
decomposition of the reliable computing platform is 
shown in figure 1. 
Figure 1: Hierarchical specification of RCP.

The top level of the hierarchy describes the oper- 
ating system as a function that sequentially invokes 
application tasks. This view of the operating system 
will be referred to as the uniprocessor model, which
forms the top-level requirement for the RCP. 

Fault tolerance is achieved by voting the results 
computed by the replicated processors operating on 
identical inputs. Interactive consistency checks on
sensor inputs and voting of actuator outputs require
synchronization of the replicated processors. The
second level in the hierarchy describes the operating 
system as a synchronous system where each repli- 
cated processor executes the same application tasks. 
The existence of a global time base, an interactive 
consistency mechanism and a reliable voting mecha- 
nism are assumed at this level. 

Although not anticipated during the Phase 1 ef- 
fort, another layer of refinement was inserted before 
the introduction of asynchrony. Level 3 of the hi- 
erarchy breaks a frame into four sequential phases. 
This allows a more explicit modeling of interproces- 
sor communication and the time phasing of compu- 
tation, communication, and voting. The use of this 
intermediate model avoids introducing these issues 
along with those of real time, thus preventing an 
overload of details in the proof process. 

At the fourth level, the assumptions of the syn- 
chronous model must be discharged. Rushby and von 
Henke [Rushby 1989] report on the formal verification
of Lamport and Melliar-Smith’s [Lamport 1985]
interactive-convergence clock synchronization algo- 
rithm. This algorithm can serve as a foundation for 
the implementation of the replicated system as a col- 
lection of asynchronously operating processors. Ded- 
icated hardware implementations of the clock syn- 
chronization function are a long-term goal. 

Figure 2 depicts the generic hardware architecture
assumed for implementing the replicated system.
Single-source sensor inputs are distributed
by special purpose hardware executing a Byzantine
agreement algorithm. Replicated actuator outputs
are all delivered in parallel to the actuators, where
force-sum voting occurs. Interprocessor communication
links allow replicated processors to exchange and
vote on the results of task computations. As previously
suggested, clock synchronization hardware will
be added to the architecture as well.

Figure 2: Generic hardware architecture.

1.2 Previous Efforts 

Many techniques for implementing fault-tolerance 
through redundancy have been developed over 
the past decade, e.g. SIFT [Goldberg 1984], 
FTMP [Hopkins 1978], FTP [Lala 1986], MAFT 
[Walter 1985], and MARS [Kopetz 1989]. An often 
overlooked but significant factor in the development 
process is the approach to system verification. In 
SIFT and MAFT, serious consideration was given to 
the need to mathematically reason about the system. 
In FTMP and FTP, the verification concept was al- 
most exclusively testing. 

Among previous efforts, only the SIFT project 
attempted to use formal methods [Moser 1987]. 
Although the SIFT operating system was never 
completely verified [NASA 1983], the concept 
of Byzantine Generals algorithms was developed 
[Lamport 1982] as was the first fault-tolerant clock 
synchronization algorithm with a mathematical per- 
formance proof [Lamport 1985]. Other theoretical 
investigations have also addressed the problems of 


86 





replicated systems [Mancini 1988]. 

Some recent work has focused on problems 
related to the style of fault-tolerant computing 
adopted by RCP. Rushby has studied a fault 
masking and transient recovery model and created 
a formalization of it using Ehdm [Rushby 1991, 
Rushby 1992]. Rushby’s model is more general
than ours, but assumes a tighter degree of synchronization
where voting takes place after every task
execution. In addition, Shankar has undertaken 
the formalization of a general scheme for model- 
ing fault-tolerant clock synchronization algorithms 
[Shankar 1991, Shankar 1992]. Several efforts in 
hardware verification are likewise relevant. Bevier 
and Young have verified a circuit design for perform- 
ing interactive consistency [Bevier 1991], while Sri- 
vas and Bickford have carried out a similar activity 
[Srivas 1991]. Schubert and Levitt have verified the 
design of processor support circuitry, namely a mem- 
ory management unit [Schubert 1991]. 

2 Modeling Approach 

The specification of the Reliable Computing Plat- 
form (RCP) is based on state machine concepts. A 
system state models the memory contents of all pro- 
cessors as well as auxiliary variables such as the fault 
status of each processor. This latter type of infor- 
mation may not be observable by a running system, 
but provides a way to express precise specifications. 
System behavior is described by specifying an initial 
state and the allowable transitions from one state 
to another. A transition specification must deter- 
mine (or constrain) the allowable destination states 
in terms of the current state and current inputs. The 
intended interpretation is that each component of the 
state models the local state of one processor and its 
associated hardware. 

RCP specifications are given in relational form. 
This enables one to leave unspecified the behavior 
of a faulty component. Consider the example below. 

R_tran: function[State, State → bool] =
  (λ s, t : nonfaulty(s(i)) ⊃ t(i) = f(s(i)))

In the relation $R_{tran}$, if component i of state s is
nonfaulty, then component i of the next state t is
constrained to equal f(s(i)). For other values of i,
that is, when s(i) is faulty, the next state value t(i) is
unspecified. Any behavior of the faulty component
is acceptable in the specification defined by $R_{tran}$.

It is important to note that the modeling of com- 
ponent hardware faults is for specification purposes 
only and reflects no self-cognizance on the part of
the running system. We assume a nonreconfigurable
architecture that is capable of masking the effects of
faults, but makes no attempt to detect or diagnose 
those faults. Transient fault recovery is the result of 
an automatic, continuous voting process; no explicit 
invocation is involved. 

2.1 RCP State Machines 

The RCP specification consists of four separate mod- 
els of the system: Uniprocessor System (US), Repli- 
cated Synchronous (RS), Distributed Synchronous 
(DS), Distributed Asynchronous (DA). Each of these 
specifications is in some sense complete; however, 
they are written at different levels of abstraction and 
describe the behavior of the system with different de- 
grees of detail. 

1. Uniprocessor System layer (US). This con- 
stitutes the top-level specification of the func- 
tional system behavior defined in terms of an 
idealized, fault-free computation mechanism. 
This specification is the correctness criterion to 
be met by all lower level designs. 

2. Replicated Synchronous layer (RS). Proces- 
sors are replicated and the state machine makes 
global transitions as if all processors were per- 
fectly synchronized. Interprocessor communica- 
tion is implicit at this layer. Fault tolerance is 
achieved using exact-match voting on the results 
computed by the replicated processors operating 
on identical inputs. 

3. Distributed Synchronous layer (DS). Next, 
the interprocessor communication mechanism is 
modeled and transitions for the RS layer ma- 
chine are broken into a series of subtransitions. 
Activity on the separate processors is still as- 
sumed to occur synchronously. Interprocessor 
communication is accomplished using a simple 
mailbox scheme. 

4. Distributed Asynchronous layer (DA). Fi- 
nally, the lowest layer relaxes the assumption of 
synchrony and allows each processor to run on 
its own independent clock. Clock time and real 
time are introduced into the modeling formal- 
ism. The DA machine requires an underlying 
clock synchronization mechanism. 

Most of this paper will concentrate on the DA layer 
specification and its proof. 

The basic design strategy is to use a fault-tolerant 
clock synchronization algorithm as the foundation 
for the operating system, providing a global time 
base for the system. Although the synchronization is 
not perfect, it is possible to develop a reliable communications
scheme where the system clock skew is
strictly bounded. For all working clocks p and q, the
synchronization algorithm provides a bounded clock
skew $\delta$ between p and q, assuming that the number
of faulty clocks, say m, does not exceed (nrep - 1)/3,
where nrep is the number of replicated processors.
This property enables a simple communications protocol
to be established whereby the receiver waits
until maxb + $\delta$ after a predetermined broadcast
time before reading a message (maxb is the maximum
communication delay).
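
The arithmetic behind this protocol is small enough to sketch in Python. The function names below are hypothetical, and times are treated as plain numbers in a common unit; the sketch only restates the bound m ≤ (nrep - 1)/3 and the waiting rule maxb + δ.

```python
def max_faulty_clocks(nrep: int) -> int:
    """Largest integer m satisfying m <= (nrep - 1) / 3."""
    return (nrep - 1) // 3


def earliest_read_time(broadcast_time: float, max_b: float,
                       delta: float) -> float:
    """Receiver-side time after which a broadcast message may be read.

    max_b is the maximum communication delay and delta the clock skew bound.
    """
    return broadcast_time + max_b + delta
```

For example, four replicated processors tolerate one faulty clock and seven tolerate two, while three tolerate none.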

Each processor in the system executes the same 
set of application tasks during every cycle of a continuously
repeating task schedule. A schedule comprises
a fixed number of frames, each frame_time
units of time long. A frame is further decomposed
into four phases: compute, broadcast, vote, and sync.
During the compute phase, all of the application
tasks scheduled for this frame are executed.² The
results of all tasks that are to be voted this frame
are then loaded into the outgoing mailbox, initiating
a broadcast send operation. During the next phase,
the broadcast phase, the system merely waits a sufficient
amount of time (maxb + $\delta$) to allow all of the
messages to be delivered. During the vote phase,
each processor retrieves all of the replicated data 
from each processor and performs a voting operation. 
Typically, majority voting is used for each of the se- 
lected state elements. The processor then replaces 
its local memory with the voted values. Finally, the 
clock synchronization algorithm is executed during 
the sync phase. Although conceptually this can be 
performed in either software or hardware, we intend 
to use a hardware implementation. 
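
The frame structure and the voting step can be pictured with a short Python sketch. It is an illustration under stated assumptions, not the RCP design: the phase ordering comes from the text, but the function bodies and the choice to return None when no strict majority exists are hypothetical.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence

PHASES = ("compute", "broadcast", "vote", "sync")


def next_phase(phase: str) -> str:
    """Cycle through the four phases of a frame."""
    return PHASES[(PHASES.index(phase) + 1) % len(PHASES)]


def majority(values: Sequence[Hashable]) -> Optional[Hashable]:
    """Return the value reported by a strict majority of replicas, else None."""
    if not values:
        return None
    value, count = Counter(values).most_common(1)[0]
    return value if 2 * count > len(values) else None
```

With three replicas reporting 3, 3, and 9 for a state element, majority returns 3, so a processor whose copy was corrupted by a transient fault would overwrite it with the voted value.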

2.2 Extended State Machine Model 

Formalizing the behavior of the Distributed Asyn- 
chronous layer requires a means of incorporating 
time. We accomplish this by formulating an ex- 
tended state machine model that includes a notion 
of local clock time for each processor. It also recog- 
nizes several types of transitions or operations that 
can be invoked by each processor. The type of oper- 
ation dictates which special constraints are imposed 
on state transitions for certain components. 

The time-extended state machine model allows for
autonomous local clocks on each processor to be
modeled using snapshots of clock time coinciding
with state transitions. Clock values within a state
represent the time at which the last transition occurred
(the time the current state was entered). If a state
was entered by processor p at time T and is occupied
for a duration D, the next transition occurs for
p at time T + D and this clock value is recorded for
p in the next state. A function $c_p(T)$ is assumed
to map local clock values for processor p into real
time. Notationally, s(i).lclock refers to the (logical)
clock-time snapshot of processor i’s clock in state s.

2 Multi-rate scheduling is accomplished in RCP by having a
task execute every n frames, where n may be chosen differently
for each task.

Clocks may become skewed in real time. Conse- 
quently, the occurrence of corresponding events on 
different processors may be skewed in real time. A 
state transition for the DA state machine corresponds 
to an aggregate transition in which each processor 
experiences the same event, such as completing one 
phase of a frame and beginning the next. Each 
processor may experience the event at different real 
times and even different clock times if duration val- 
ues are not identical. 

Four classes of operations are distinguished: 

1. L: Purely local processing that involves no 
broadcast communication or mailbox access. 

2. B: Broadcast communication where a send is 
initiated when the state is entered and must be 
completed before the next transition. 

3. R: Local processing that involves no send op- 
erations, but does include reading of mailbox 
values. 

4. C: Clock synchronization operations that may 
cause the local clock to be adjusted and appear 
to be discontinuous. 

We make the simplifying assumption that the du- 
ration spent in each state, except those of type C, 
is nominally a fixed amount of clock time. Allowances
need to be made, however, for small variations
in the actual clock time used by real processors.
Thus if $\nu$ is the maximum rate of variation and
$D_I$ and $D_A$ are the intended and actual durations, then
$|D_A - D_I| < \nu D_I$ must hold.
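
A minimal Python sketch of the snapshot rule and the duration tolerance follows. The function names are hypothetical, durations are plain numbers of clock ticks, and the strict comparison mirrors the inequality as printed above.

```python
def next_snapshot(entered_at: float, duration: float) -> float:
    """Clock value recorded in the next state: T + D."""
    return entered_at + duration


def duration_ok(d_actual: float, d_intended: float, nu: float) -> bool:
    """Check |D_A - D_I| < nu * D_I for one state occupancy."""
    return abs(d_actual - d_intended) < nu * d_intended
```

With an intended duration of 1000 ticks and a variation rate of 0.01, an actual duration of 1001 ticks is acceptable, whereas 1020 ticks is not.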

2.3 The Proof Method 

The proof method is a variation of the classical al- 
gebraic technique of showing that a homomorphism 
exists. Such a proof can be visualized as showing 
that a diagram “commutes” (figure 3). Consider 
two adjacent levels of abstraction, called the top and 
bottom levels for convenience. At the top level we 
have a current state, $s'$, a destination state, $t'$, and
a transition that relates the two. The properties of
the transition are given as a mathematical relation,
$\mathcal{N}_{top}(s', t')$. Similarly, the bottom level consists of
states, s and t, and a transition that relates the two,
$\mathcal{N}_{bottom}(s, t)$. The state values at the bottom level
are related to the state values at the top level by
way of a mapping function, map. To establish that
the bottom level implements the top level one must
show that the diagram commutes (in a sense meant
for relations instead of functions):

$\mathcal{N}_{bottom}(s, t) \supset \mathcal{N}_{top}(map(s), map(t))$

where $map(s) = s'$ and $map(t) = t'$ in the diagram.
One must also show that initial states map up:

$\mathcal{I}_{bottom}(s) \supset \mathcal{I}_{top}(map(s))$

Figure 3: States, transitions, and mappings.

An additional consideration in constructing such 
proofs is that only states reachable from an initial 
state are relevant. Thus, it suffices to prove a con- 
ditional form of commutativity that assumes transi- 
tions always begin from reachable states. A weaker 
form of the theorem is then called for: 

$\mathcal{R}(s) \wedge \mathcal{N}_{bottom}(s, t) \supset \mathcal{N}_{top}(map(s), map(t))$

where $\mathcal{R}$ is a reachability predicate. This form enables
proofs that proceed by first establishing state
invariants. Each invariant is shown to hold for all 
reachable states using a specialized induction schema 
and then invoked as a lemma in the main proof. 

By carrying out such proofs for each adjacent pair 
of specification layers in figure 1, we construct a tran- 
sitive argument that the lowest layer correctly im- 
plements the top-most layer. This is equivalent to a 
direct proof from bottom to top using the functional 
composition of all the mappings. Such a large proof 
is difficult to accomplish in practice; hence the use 
of a layered approach. 
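
For finite state spaces the two proof obligations can be checked by brute force, which may help in reading the commutation condition. The Python sketch below is an illustration only and not the Ehdm proof: all function names and the toy machines (a modulo-4 counter implementing a parity bit) are hypothetical.

```python
def reachable(states, initial, trans):
    """Set of states reachable from the initial states via trans."""
    seen = {s for s in states if initial(s)}
    frontier = list(seen)
    while frontier:
        s = frontier.pop()
        for t in states:
            if trans(s, t) and t not in seen:
                seen.add(t)
                frontier.append(t)
    return seen


def implements(states, init_bottom, n_bottom, init_top, n_top, mapping):
    """Check that initial states map up and that transitions commute.

    Commutation is only required for transitions leaving reachable states.
    """
    if any(init_bottom(s) and not init_top(mapping(s)) for s in states):
        return False
    reach = reachable(states, init_bottom, n_bottom)
    return all(
        n_top(mapping(s), mapping(t))
        for s in reach for t in states if n_bottom(s, t)
    )


# Toy example: the bottom level counts modulo 4, the top level tracks parity.
STATES = range(4)
init_bottom = lambda s: s == 0
n_bottom = lambda s, t: t == (s + 1) % 4
init_top = lambda s: s == 0
n_top = lambda s, t: t == 1 - s
parity = lambda s: s % 2
```

In this toy case implements(STATES, init_bottom, n_bottom, init_top, n_top, parity) holds; replacing n_top by a relation that keeps the parity unchanged makes the check fail.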


2.4 Ehdm Language and Verification System

Design verification in RCP has been carried out 
using Ehdm. The Ehdm verification system
[von Henke 1988] is a mature tool, which has been
under development by SRI International since 1983 
and followed their earlier work on HDM. It comprises 
a highly integrated environment for formal system 
development. The specification language is based on 
a higher-order logic with features supporting module 
structure and parameterization. An operational sub- 
set of the language can be automatically translated 
to Ada. 

Ehdm contains an automated theorem prover to 
support proving in the higher-order logic. Decision 
procedures for several arithmetic domains are em- 
bedded in the system. Users invoke the prover by 
writing a proof directive in the specification lan- 
guage, stating explicit premises and any necessary 
substitutions. 


3 Clock Time and Real Time 

In this section we discuss the synchronization theory 
upon which the DA specification depends. Although 
the RCP architecture does not depend on any particular
clock synchronization algorithm, we have used
the specification for the interactive convergence algorithm
(ICA) [Lamport 1985] since Ehdm specifications
for ICA already exist [Rushby 1989].

The formal definition of a clock is fundamental. A 
clock can be modeled as a function from real time t
to clock time T: C(t) = T, or as a function from clock
time to real time: c(T) = t.³ Since the ICA theory
was expressed in terms of the latter, we will also be
modeling clocks as functions from clock time to real
time. We must be careful to distinguish between an
uncorrected clock and a clock being resynchronized
periodically. We use the notation c(T) for an uncorrected
clock and $rt^{(i)}(T)$ to represent a synchronized
clock during its ith frame.⁴

3.1 Fault Model for Clocks 

In addition to requirements conditioned on having a 
nonfaulty processor, the DA specifications are con- 
cerned with having a nonfaulty clock as well. It is 
assumed that the clock is an independent piece of 
hardware whose faults can be isolated from those 
of the corresponding processor. Although some implementations
of a fault-tolerant architecture such
as RCP could execute part of the clock synchronization
function in software, thereby making clock
faults and processor faults mutually dependent, we
assume that RCP implementations will have a dedicated
hardware clock synchronization function. This
means that a clock can continue to function properly
during a transient fault period on its adjoining
processor. The converse is not true, however. Since
the software executing on a processor depends on
the clock to properly schedule events, a nonfaulty
processor having a faulty clock may produce errors.
Therefore, a one-way fault dependency exists.

3 We will use the now standard convention of representing
clock time with capital letters and real time with lower case
letters.

4 This differs from the notation, $c^{(i)}(T)$, used in
[Rushby 1989].

Good clocks have different drift rates with respect
to perfect time. Nevertheless, this drift rate can be
bounded. Thus, we define a good clock as one whose
drift rate is strictly bounded by $\rho/2$. A clock is
“good”, i.e., a predicate good_clock($T_0$, $T_n$) is true,
between clock times $T_0$ and $T_n$ iff:

$\forall T_1, T_2 : T_0 < T_1 < T_n \wedge T_0 < T_2 < T_n \supset |c_p(T_1) - c_p(T_2) - (T_1 - T_2)| < \frac{\rho}{2} \cdot |T_1 - T_2|$
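
The drift bound can be exercised numerically on a finite set of clock readings. The Python sketch below is only a sampled check, not a proof of the universally quantified predicate: the function name, the sample times, and both toy clocks are assumptions, only pairs of distinct sample times are compared, and exact rational arithmetic is used to avoid rounding effects.

```python
from fractions import Fraction
from itertools import combinations


def good_clock_on_samples(c, clock_times, rho):
    """Check the drift bound for every pair of distinct sampled clock times.

    c maps clock time to real time; rho / 2 bounds the drift rate.
    """
    return all(
        abs(c(t1) - c(t2) - (t1 - t2)) < (rho / 2) * abs(t1 - t2)
        for t1, t2 in combinations(clock_times, 2)
    )


RHO = Fraction(1, 100_000)          # illustrative drift parameter
SAMPLES = [0, 10, 250, 1000]        # distinct clock times to compare

# A clock drifting by one part per million, within rho / 2.
slow_drift = lambda T: T * Fraction(1_000_001, 1_000_000)
# A clock running ten percent fast, far outside the bound.
fast_drift = lambda T: T * Fraction(11, 10)
```

The first toy clock passes the sampled check and the second fails it.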

The synchronization algorithm is executed once
every frame of duration frame_time. The notation
$T^{(i)}$ is used to represent the start of the ith frame
at time $T^{(0)} + i * frame\_time$. The notation $T \in R^{(i)}$
means that T falls in the ith frame, that is,

$\exists \Pi : 0 < \Pi < frame\_time \wedge T = T^{(i)} + \Pi$

During the ith frame the synchronized clock on processor
p, $rt_p$, is defined by $rt_p(i, T) = c_p(T + Corr_p^{(i)})$,
where Corr is the cumulative sum of the corrections
that have been made to the (logical) clock.

Note that in order for a clock to be nonfaulty in
the current frame it is necessary that it has been
working continuously from time zero⁵:

$good\_clock(p,\ T^{(0)} + Corr_p^{(0)},\ T^{(i+1)} + Corr_p^{(i)})$

From these definitions we state the condition of hav- 
ing enough good clocks to maintain synchronization: 

enough_clocks: function[period → bool] =
  (λ i : 3 * num_good_clocks(i, nrep) > 2 * nrep)
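
As a worked instance of this condition, the following Python sketch evaluates the inequality for a few configurations. It is a simplification: the specification's num_good_clocks(i, nrep) depends on the frame i, whereas the hypothetical function here takes the count of good clocks directly.

```python
def enough_clocks(num_good_clocks: int, nrep: int) -> bool:
    """More than two thirds of the nrep clocks must be good."""
    return 3 * num_good_clocks > 2 * nrep
```

With nrep = 4, three good clocks suffice (9 > 8) but two do not (6 > 8 fails), which agrees with the bound of one faulty clock among four. With nrep = 6, four good clocks are not enough, since 12 > 12 fails.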

3.2 Clock Synchronization

Clock synchronization theory provides two important
properties about the clock synchronization algorithm,
namely that the skew between good clocks
is bounded and that the correction to a good clock
is always bounded. The maximum skew is denoted
by $\delta$ and the maximum correction is denoted by $\Sigma$.
More formally, for all nonfaulty clocks p and q, two
conditions obtain:

S1: $\forall T \in R^{(i)} : |rt_p^{(i)}(T) - rt_q^{(i)}(T)| < \delta$
S2: $|Corr_p^{(i+1)} - Corr_p^{(i)}| < \Sigma$

5 This is a limitation not of RCP, but of existing, mechanically
verified fault-tolerant clock synchronization theory. Future
work will concentrate on how to make clock synchronization
robust in the presence of transient faults.

The value of $\delta$ is determined by several key parameters
of the synchronization system: $\rho$, $\epsilon$, $\delta_0$, m, nrep.
The parameter $\epsilon$ is a bound on the error in reading
another processor’s clock. $\delta_0$ is an upper bound on
the initial clock skew and m is the maximum number
of faulty clocks.

The main synchronization theorem is: 

sync_thm: Theorem enough_clocks(i) ⊃
  (∀ p, q : (∀ T : T ∈ R^(i) ∧
      nonfaulty_clock(p, i) ∧ nonfaulty_clock(q, i)
        ⊃ |rt_p^(i)(T) − rt_q^(i)(T)| < δ))

The proof that DA implements DS depends crucially 
upon this theorem. 

3.3 Implementation Restrictions 

Recall that the DA extended state machine model 
recognized four different classes of state transition: 
L, B, R, C. Although each is used for a different phase 
of the frame, the transition types were introduced 
because operation restrictions must be imposed on 
implementations to correctly realize the DA specifi- 
cations. Failure to satisfy these restrictions can ren- 
der an implementation at odds with the underlying 
execution model, where shared data objects are sub- 
ject to the problems of concurrency. The set of con- 
straints on the DA model’s implementation concerns 
possible concurrent accesses to the mailboxes. 

While a broadcast send operation is in progress, 
the receivers’ mailbox values are undefined. If the 
operation is allowed sufficient time to complete, the 
mailbox values will match the original values sent. If 
insufficient time is allowed, or a broadcast operation 
is begun immediately following the current one, the 
final mailbox value cannot be assured. Furthermore, 
we make the additional restriction that all other uses 
of the mailbox be limited to read-only accesses. This 
provides a simple sufficient condition for noninter- 
fering use of the mailboxes, thereby avoiding more 
complex mutual exclusion restrictions. 

Operation Restrictions. Let s and t
be successive DA states, i be the processor
with the earliest value of $c_i(s(i).lclock)$, and
j be the processor with the latest value of
$c_j(t(j).lclock)$. If s corresponds to a broadcast
(B) operation, all processors must have
completed the previous operation of type R
by time $c_i(s(i).lclock)$, and the next operation
of type B can begin no earlier than time
$c_j(t(j).lclock)$. No processor may write to
its mailbox during an operation of type B
or R.

By introducing a prescribed discipline on the use 
of mailboxes, we ensure that the axiom describing 
broadcast communication can be legitimately used 
in the DA proof. Although the restrictions are ex- 
pressed in terms of real time inequalities over all 
processors’ clocks, it is possible to derive sufficient 
conditions that satisfy the restrictions and can be 
established from local processor specifications only, 
assuming a clock synchronization mechanism is in 
place. 

4 Design Specifications 

The RCP specifications are expressed in terms of 
some common types and constants, declared in 
Ehdm as follows: 

Pstate: Type    (* computation state *)
inputs: Type    (* sensor inputs *)
outputs: Type   (* actuator outputs *)
nrep: nat       (* number of processors *)

Mailboxes and their unit of information exchange 
are provided with types: 

MB : Type (* mailbox entry *) 

MBvec: Type = array [processors] of MB 

This scheme provides one slot in the mailbox array 
for each replicated processor. 

In the following, we present a sketch of the spec- 
ifications for the US and DA layers. To keep the 
presentation brief, we omit the RS and DS specifications.
Details can be found in [Butler 1992].

4.1 US Specification 

The US specification is very simple: 

N_us: function[Pstate, Pstate, inputs → bool] =
  (λ s, t, u : t = f_c(u, s))

The function $\mathcal{N}_{us}$ defines the transition relation between
the current state and the next state. We require
that the computation performed by the uniprocessor
system be deterministic and can be modeled
by a function $f_c$ : inputs × Pstate → Pstate.
To fit the relational, nondeterministic state machine
model we simply equate $\mathcal{N}_{us}(s, t, u)$ to the predicate
$t = f_c(u, s)$.

External system outputs are selected from the values
computed by $f_c$. The function $f_a$ : Pstate →
outputs denotes the selection of state variable values
to be sent to the actuators. The type outputs represents
a composite of actuator output types.

While there is no explicit mention of time in the 
US model, it is intended that a transition correspond 
to one frame of the execution schedule. 

The constant initial_proc_state represents the initial
Pstate value when computation begins.

initial_us: function[Pstate → bool] =
  (λ s : s = initial_proc_state)

Although the initial state value is unique, initial_us is
expressed in predicate form for consistency with the
overall relational method of specification.
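
The US layer is simple enough to animate directly. The Python sketch below is a hypothetical rendering, not the Ehdm text: states and inputs are integers, the sample f_c and f_a are invented for illustration, and run_us is an assumed helper that takes one transition per frame.

```python
def n_us(s, t, u, f_c) -> bool:
    """Transition relation of the uniprocessor model: t = f_c(u, s)."""
    return t == f_c(u, s)


def initial_us(s, initial_proc_state) -> bool:
    """Initial-state predicate: s must equal the initial state constant."""
    return s == initial_proc_state


def run_us(initial_proc_state, inputs, f_c, f_a):
    """Take one transition per frame and collect the actuator outputs."""
    s = initial_proc_state
    outputs = []
    for u in inputs:
        t = f_c(u, s)
        assert n_us(s, t, u, f_c)   # every step satisfies the relation
        outputs.append(f_a(t))
        s = t
    return s, outputs


# Invented sample functions: accumulate inputs, output the last digit.
sample_f_c = lambda u, s: s + u
sample_f_a = lambda s: s % 10
```

Starting from state 0 with sensor inputs 3, 4, and 5, the states visited are 3, 7, and 12, and the outputs are 3, 7, and 2.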

4.2 DA Specification 

The DA specification permits each processor to run 
asynchronously. Every processor in the system has 
its own clock and task executions on one processor 
take place at different times than on other processors. 
Nevertheless, the model at this level explicitly takes 
advantage of the fact that the clocks of the system 
are synchronized to within a bounded skew $\delta$.

da_proc_state: Type =
  Record healthy : nat,
         proc_state : Pstate,
         mailbox : MBvec,
         lclock : logical_clocktime,
         cum_delta : number
  end record

da_proc_array: Type =
  array [processors] of da_proc_state

DAstate: Type =
  Record phase : phases,
         sync_period : nat,
         proc : da_proc_array
  end record

The phase field of a DAstate indicates whether the 
current phase of the state machine is compute, broad- 
cast, vote, or sync. The sync_period field holds the 
current (unbounded) frame number. 

The state for a single processor is given by a record
named da_proc_state. The first field of the record is
healthy, which is 0 when a processor is faulty. Otherwise,
it indicates the (unbounded) number of state
transitions since the last transient fault. A permanently
faulty processor would have zero in this field
for all subsequent frames. A processor that is recovering
from a transient fault is indicated by a value
of healthy less than the constant recovery_period. A
processor is said to be working whenever healthy ≥
recovery_period. The proc_state field of the record is
the computation state of the processor. The mailbox
field of the record denotes the incoming mailbox
mechanism on each processor.

The lclock field of a DAstate stores the current
value of the processor’s local clock. The real
time corresponding to this clock time can be found
through use of the auxiliary function da_rt.

Figure 4: Relationship between $c_p$ and da_rt.

da_rt: function[DAstate, processors,
               logical_clocktime → realtime] =
  (λ da, p, T : c_p(T + da.proc(p).cum_delta))

This function corresponds to the rt function of the
clock synchronization theory. Thus, da_rt(s, p, T)
yields the real time corresponding to processor p’s
synchronized clock. Given a clock time T in the current
frame (s.sync_period), da_rt returns the real time
at which processor p’s clock reads T. The current
value of the cumulative correction is stored in the
field cum_delta.

Every frame the clock synchronization algorithm
is executed, and an adjustment given by the Corr
function of the clock synchronization theory is added
to cum_delta. Figure 4 illustrates the relationship
among $c_p$, da_rt, and cum_delta.
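
The interplay of the uncorrected clock, cum_delta, and da_rt can be sketched in Python as follows. This is an illustration under assumptions: the ProcClock record keeps only the two clock-related fields, apply_correction is a hypothetical helper standing in for the sync phase, and the toy clock offset_clock is invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProcClock:
    """Clock-related fields of one processor's state (other fields omitted)."""
    lclock: int      # logical clock-time snapshot
    cum_delta: int   # cumulative correction applied so far


def da_rt(c_p, proc: ProcClock, T: int):
    """Real time at which the synchronized clock reads T."""
    return c_p(T + proc.cum_delta)


def apply_correction(proc: ProcClock, corr: int) -> ProcClock:
    """Add one frame's adjustment to the cumulative correction."""
    return ProcClock(proc.lclock, proc.cum_delta + corr)


# Toy uncorrected clock: real time is clock time plus a fixed offset of 5.
offset_clock = lambda T: T + 5
before_sync = ProcClock(lclock=100, cum_delta=3)
after_sync = apply_correction(before_sync, -1)
```

Before the correction, clock time 100 maps to real time 108; after a correction of -1 the same clock time maps to 107.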

The specification of time-critical behavior in the
DA model is accomplished using the da_rt function.
For example, the broadcast_received function is
expressed in terms of da_rt:

broadcast_received:
  function[DAstate, DAstate, processors → bool] =
  (λ s, t, q: (∀p:
    (s.proc(p).healthy > 0
      ∧ da_rt(s, p, s.proc(p).lclock)
          + max_comm_delay
          < da_rt(t, q, t.proc(q).lclock))
    ⊃ t.proc(q).mailbox(p) =
        s.proc(p).mailbox(p)))


N_da: function[DAstate, DAstate,
               inputs → bool] =
  (λ s, t, u: enough_hardware(t)
    ∧ t.phase = next_phase(s.phase)
    ∧ (∀i: if s.phase = sync
        then N_da^s(s, t, i)
        else t.proc(i).healthy =
               s.proc(i).healthy
          ∧ t.proc(i).cum_delta =
               s.proc(i).cum_delta
          ∧ t.sync_period = s.sync_period
          ∧ (nonfaulty_clock(i,
                 s.sync_period) ⊃
               clock_advanced(s.proc(i).lclock,
                              t.proc(i).lclock,
                              duration(s.phase))
               ∧ (s.phase = compute ⊃
                    N_da^c(s, t, u, i))
               ∧ (s.phase = broadcast ⊃
                    N_da^b(s, t, i))
               ∧ (s.phase = vote ⊃
                    N_da^v(s, t, i)))
        end if))

Figure 5: DA transition relation. 

Thus, the data in the incoming bin p on processor q
is defined to be equal to the value broadcast by p,
s.proc(p).mailbox(p), only when the real time on the
receiving end, da_rt(t, q, t.proc(q).lclock), is
greater than the real time at which the send was
initiated, da_rt(s, p, s.proc(p).lclock), plus
max_comm_delay. This specification anticipates the
design of a communications system that can deliver
a message within max_comm_delay units of time.
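
The shape of this predicate can be sketched in Python as
follows. The sketch assumes that the real times have already
been computed (the role of da_rt) and passes them in
directly; the dictionary-based representation and all
parameter names are illustrative assumptions rather than the
Ehdm definitions.

```python
from typing import Dict, Hashable


def broadcast_received(
    send_rt: Dict[int, float],       # real time at which each sender p began its send
    recv_rt: float,                  # real time at which receiver q reads its mailbox
    healthy: Dict[int, int],         # healthy counter of each sender in state s
    sent: Dict[int, Hashable],       # value broadcast by p, i.e. s.proc(p).mailbox(p)
    received: Dict[int, Hashable],   # receiver's bins, i.e. t.proc(q).mailbox(p)
    max_comm_delay: float,
) -> bool:
    """For every sender p: if p is not faulty and its send happened more than
    max_comm_delay before the read, then bin p must hold the value p sent."""
    for p in send_rt:
        timely = healthy[p] > 0 and send_rt[p] + max_comm_delay < recv_rt
        if timely and received.get(p) != sent[p]:
            return False
    return True
```

Note that the predicate constrains nothing about bins whose
sender is faulty or whose message could still be in transit;
any value is permitted there.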

In the DA level there is no single transition that
covers the entire frame. There is only a phase-based
state transition relation, N_da, shown in figure 5.
Note that the transition to a new state is only valid
when enough_hardware holds in the next state:

enough_hardware:
  function[DAstate → bool] =
  (λ t: maj_working(t) ∧
        enough_clocks(t.sync_period))
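
A rough Python rendering of this predicate is given below.
The definitions of maj_working and enough_clocks are not
shown in this section, so the thresholds used here (a strict
majority of working processors and strictly more than
two-thirds nonfaulty clocks) are assumptions, as are the
list-based arguments and the explicit recovery_period
parameter. A processor is treated as working when its healthy
count is at least recovery_period.

```python
from typing import Sequence


def maj_working(healthy: Sequence[int], recovery_period: int) -> bool:
    """Assumed reading: a strict majority of processors are working."""
    working = sum(1 for h in healthy if h >= recovery_period)
    return 2 * working > len(healthy)


def enough_clocks(nonfaulty: Sequence[bool]) -> bool:
    """Assumed reading: strictly more than two-thirds of the clocks are nonfaulty."""
    good = sum(1 for ok in nonfaulty if ok)
    return 3 * good > 2 * len(nonfaulty)


def enough_hardware(healthy: Sequence[int],
                    nonfaulty: Sequence[bool],
                    recovery_period: int) -> bool:
    """Conjunction of the two hardware conditions, evaluated on the next state."""
    return maj_working(healthy, recovery_period) and enough_clocks(nonfaulty)
```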

The transition relation N_da is defined in terms of four
subrelations (not shown): N_da^c, N_da^b, N_da^v, and
N_da^s, each of which applies to a particular phase
type.

As defined by the compute phase relation N_da^c,
the proc_state field is updated with the results of
task computation, f_c(u, s.proc(i).proc_state), and the
mailbox is loaded with the subset of these results to
be broadcast. Note that each nonfaulty replicated
processor is required to behave deterministically with
respect to task computation; in particular, f_c is the
same computation function as specified in the US
layer. Moreover, the local clock time is changed in
the new state. This is accomplished by the predicate
clock_advanced, which is not based on a simple
incrementation operation because the number of clock
cycles consumed by an instruction stream will exhibit
a small amount of variation on real processors.
The function clock_advanced accounts for this
variability, meaning the start of the next phase is not
wholly determined by the start time of the current
phase.

clock_advanced:
  function[logical_clocktime, logical_clocktime,
           number → bool] =
  (λ X, Y, D: X + D * (1 − ν) < Y ∧
              Y < X + D * (1 + ν))


ν represents the maximum rate at which one
processor’s execution time over a phase can vary from
the nominal amount given by the duration function.
ν is intended to be a nonnegative fractional value,
0 < ν < 1. The nominal amount of time spent in
each phase is specified by a function named duration:

duration: function[phases → logical_clocktime]
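
The window that clock_advanced allows can be illustrated
with a short Python sketch. Passing ν as an explicit
parameter nu, and the helper advance_window, are
assumptions made for illustration; in the specification ν is
a constant.

```python
from typing import Tuple


def advance_window(x: float, d: float, nu: float) -> Tuple[float, float]:
    """Open interval of clock values permitted at the start of the next phase,
    given the current phase start x and nominal duration d."""
    return (x + d * (1 - nu), x + d * (1 + nu))


def clock_advanced(x: float, y: float, d: float, nu: float) -> bool:
    """True when y lies strictly inside the window around x + d."""
    low, high = advance_window(x, d, nu)
    return low < y and y < high
```

For example, with a phase starting at clock time 100, a
nominal duration of 10, and nu = 0.05, a next-phase start of
110.2 is accepted while 111 is rejected.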

The predicate initial_da puts forth the conditions
for a valid initial state. The initial phase is set to
compute and the initial sync period is set to zero.
Each element of the DA state array has its healthy
field equal to recovery_period and its proc_state field
equal to initial_proc_state.

initial_da: function[DAstate → bool] =
  (λ s: s.phase = compute ∧
        s.sync_period = 0 ∧
    (∀i: s.proc(i).healthy = recovery_period ∧
         s.proc(i).proc_state = initial_proc_state ∧
         s.proc(i).cum_delta = 0 ∧
         s.proc(i).lclock = 0 ∧
         nonfaulty_clock(i, 0)))

By initializing the healthy fields to the constant
recovery_period we are starting the system with all
processors working. Note that the mailbox fields are
not initialized; any mailbox values can appear in a
valid initial DAstate.

5 Summary of System Proof 

Figure 6 shows the complete state machine hierarchy
and the relationships of transitions within the
aggregate model. By performing three layer-to-layer
state machine implementation proofs, the states of
DA, the lowest layer, are shown to correctly map to
those of US, the highest layer. This means that any
implementation satisfying the DA specification will
likewise satisfy US under our chosen interpretation,
which is given by a functional composition:

DAmap ∘ DSmap ∘ RSmap

Figure 6: RCP state machine and proof hierarchy.

5.1 Overall Hierarchy 

The two theorems required to establish that RS
implements US are the following.

RS_frame_commutes: Theorem
  reachable(s) ∧ N_rs(s, t, u) ⊃
    N_us(RSmap(s), RSmap(t), u)

RS_initial_maps: Theorem
  initial_rs(s) ⊃ initial_us(RSmap(s))

The theorem RS_frame_commutes shows that a successive
pair of reachable RS states can be mapped by
RSmap into a successive pair of US states (upper tier
of figure 6 commutes). The theorem RS_initial_maps
shows that an initial RS state can be mapped into
an initial US state.

To establish that DS implements RS, the following
formulas must be proved.

DS_frame_commutes: Theorem
  s.phase = compute ∧ frame_N_ds(s, t, u) ⊃
    N_rs(DSmap(s), DSmap(t), u)

DS_initial_maps: Theorem
  initial_ds(s) ⊃ initial_rs(DSmap(s))

Note that DS transitions have finer granularity than
RS transitions: one per phase (four per frame).
Therefore, to follow the proof paradigm, we must
consider only DS states found at the beginning of
each frame, namely those whose phase is compute.
frame_N_ds is a predicate that composes four
sequential phase transitions using N_ds:

frame_N_ds: function[DSstate, DSstate,
                     inputs → bool] =
  (λ s, t, u: (∃ x, y, z:
    N_ds(s, x, u) ∧ N_ds(x, y, u) ∧
    N_ds(y, z, u) ∧ N_ds(z, t, u)))

Using this device, we can show that the second tier
of figure 6 commutes.
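
The composition performed by frame_N_ds can be sketched in
Python by searching for the three intermediate states over a
finite state set. The finite enumeration, the helper name
frame_n, and the toy relation phase_step (which tracks only
the phase) are assumptions made for illustration; the Ehdm
definition quantifies over all DS states.

```python
from itertools import product
from typing import Callable, Hashable, Iterable

Step = Callable[[Hashable, Hashable, Hashable], bool]

NEXT_PHASE = {"compute": "broadcast", "broadcast": "vote",
              "vote": "sync", "sync": "compute"}


def phase_step(s: Hashable, t: Hashable, u: Hashable) -> bool:
    """Toy stand-in for a per-phase relation: only the phase advances."""
    return NEXT_PHASE[s] == t


def frame_n(step: Step, states: Iterable[Hashable]) -> Step:
    """Build a frame-level relation: s reaches t in exactly four steps,
    through some intermediate states x, y, z, all with the same input u."""
    universe = list(states)

    def frame(s: Hashable, t: Hashable, u: Hashable) -> bool:
        return any(
            step(s, x, u) and step(x, y, u) and step(y, z, u) and step(z, t, u)
            for x, y, z in product(universe, repeat=3)
        )

    return frame
```

With the toy relation, a frame that starts in the compute
phase can only end in the compute phase, which matches the
restriction of the theorem to states whose phase is compute.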

Finally, to establish that DA implements DS, the
following formulas must be proved:

phase_commutes: Theorem
  reachable(s) ∧ N_da(s, t, u) ⊃
    N_ds(DAmap(s), DAmap(t), u)

DA_initial_maps: Theorem
  initial_da(s) ⊃ initial_ds(DAmap(s))

Since DA and DS transitions are both one per phase,
the proof is completed by showing that each of the
four lower cells of figure 6 commutes.

5.2 DA Layer Proof 

We provide a brief sketch of the key parts of the DA
to DS proof. First, note that the two specifications
are very similar in structure. The primary difference
is that the DS specification lacks all features
related to clock time and real time. A DSstate
structure is similar to a DAstate, lacking only the
lclock, cum_delta, and sync_period fields. Thus, in the
DA to DS mapping function, these fields are not mapped
(i.e., are abstracted away) and all of the other fields
are mapped identically.
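
A minimal Python sketch of such a mapping is shown below.
The dataclass layout and the name da_map are illustrative
assumptions modeled on the field names in the text; the
actual DAmap is defined in Ehdm and is not reproduced here.

```python
from dataclasses import dataclass
from typing import Any, Tuple


@dataclass(frozen=True)
class DAProc:
    healthy: int
    proc_state: Any
    mailbox: Tuple[Any, ...]
    lclock: float      # timing field, dropped by the mapping
    cum_delta: float   # timing field, dropped by the mapping


@dataclass(frozen=True)
class DAState:
    phase: str
    sync_period: int   # timing field, dropped by the mapping
    proc: Tuple[DAProc, ...]


@dataclass(frozen=True)
class DSProc:
    healthy: int
    proc_state: Any
    mailbox: Tuple[Any, ...]


@dataclass(frozen=True)
class DSState:
    phase: str
    proc: Tuple[DSProc, ...]


def da_map(s: DAState) -> DSState:
    """Copy every non-timing field unchanged and discard the timing fields."""
    return DSState(
        phase=s.phase,
        proc=tuple(DSProc(p.healthy, p.proc_state, p.mailbox) for p in s.proc),
    )
```

Two DA states that differ only in clock readings map to the
same DS state, which is what it means for the timing
information to be abstracted away.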

Additionally, the DS transition relation is very
similar to N_da:

N_ds: function[DSstate, DSstate,
               inputs → bool] =
  (λ s, t, u: maj_working(t)
    ∧ t.phase = next_phase(s.phase)
    ∧ (∀i:
        if s.phase = sync
        then N_ds^s(s, t, i)
        else t.proc(i).healthy =
               s.proc(i).healthy
          ∧ (s.phase = compute ⊃
               N_ds^c(s, t, u, i))
          ∧ (s.phase = broadcast ⊃
               N_ds^b(s, t, i))
          ∧ (s.phase = vote ⊃
               N_ds^v(s, t, i))
        end if))


The phase_commutes theorem must be shown to
hold for all four phases. Thus, the proof is
decomposed into four separate cases, each of which is
handled by a lemma of the form:

phase_com_X: Lemma
  s.phase = X ∧ N_da(s, t, u) ⊃
    N_ds(DAmap(s), DAmap(t), u)

where X is any one of {compute, broadcast,
vote, sync}. The proof of this theorem requires
the expansion of the N_da relation and showing
that the resulting formula logically implies
N_ds(DAmap(s), DAmap(t), u).

The proof of each lemma phase_com_X is facilitated
by using a common, general scheme for each
phase that further decomposes the proof by means
of four subordinate lemmas. The general form of
these lemmas is as follows:

Lemma 1: s.phase = X ∧ N_da(s, t, u) ⊃
           N_da^X(s, t, i)

Lemma 2: s.phase = X ∧ N_da^X(s, t, i) ⊃
           N_ds^X(DAmap(s), DAmap(t), i)

Lemma 3: ss.phase = X ∧
           DS.maj_working(tt) ∧
           (∀i: N_ds^X(ss, tt, i)) ⊃
           N_ds(ss, tt, u)

Lemma 4: s.phase = X ∧ N_da(s, t, u) ⊃
           DS.maj_working(DAmap(t))

A few differences exist among the lemmas for the four
phases, but they adhere to this scheme fairly closely.
The phase_com_X lemma follows by chaining the four
lemmas together:

N_da(s, t, u) ⊃ (∀i: N_da^X(s, t, i)) ⊃
  (∀i: N_ds^X(DAmap(s), DAmap(t), i)) ⊃
  N_ds(DAmap(s), DAmap(t), u)

For three of the four lemmas above, the proofs are
elementary. The proof of Lemma 1 follows
directly from the definition of N_da. Lemma 3 follows
directly from the definition of N_ds. Lemma 4 follows
from the definition of N_da, enough_hardware, and the
basic mapping lemmas.

Furthermore, for three of the four phases, the proof
of Lemma 2 is straightforward. For all but the
broadcast phase, Lemma 2 follows from the definition of
N_ds^X, N_da^X, and the basic mapping lemmas.

However, in the broadcast phase, Lemma 2 from the
scheme above, which is named com_broadcast_2, is a
much deeper theorem. The broadcast phase is where
the effects of asynchrony are felt: we must show that
interprocessor communications are properly received
in the presence of asynchronously operating
processors. Without clock synchronization we would be
unable to assert that broadcast data is received. Hence
the need to invoke clock synchronization theory and
its attendant reasoning over inequalities of time.

The lemma com_broadcast_2 deals with the main
difference between the DA level and the DS level: the
timing constraint in the function broadcast_received.
The timing constraint

da_rt(s, p, s.proc(p).lclock) + max_comm_delay <
  da_rt(t, q, t.proc(q).lclock)

must be satisfied to show that the DS level analog
of broadcast_received holds. A key lemma relating
real times on two processors is instrumental for this
purpose:

ELT: Lemma
  T_2 ≥ T_1 + bb ∧ (T_1 > T^0)
  ∧ (bb > T^0) ∧ T_2 ∈ R^(sp) ∧ T_1 ∈ R^(sp)
  ∧ nonfaulty_clock(p, sp)
  ∧ nonfaulty_clock(q, sp)
  ∧ enough_clocks(sp)
  ⊃ rt_p^(sp)(T_2) >
      rt_q^(sp)(T_1) + (1 − ρ/2) * |bb| − δ

This lemma establishes an important property of
timed events in the presence of a fault-tolerant clock
synchronization algorithm. Suppose that on processor q
an event occurs at T_1 according to its own clock
and another event occurs on processor p at time T_2
according to its own clock. Then, assuming that the
clock times fall within the current frame and enough
clocks are nonfaulty, the following is true about
the real times of the events:

rt_p^(sp)(T_2) > rt_q^(sp)(T_1) + (1 − ρ/2) * |bb| − δ

where bb = T_2 − T_1, T_1 = s.proc(p).lclock, and T_2 =
t.proc(q).lclock.

If we apply this lemma to the broadcast phase,
letting T_1 be the time that the sender loads his
outgoing mailbox bin and T_2 be the earliest time that
the receivers can read their mailboxes (i.e., at the
start of the vote phase), we know that these events are
separated in time by more than (1 − ρ/2) * |bb| − δ. By
choosing the value bb = duration(broadcast) in such a
way that this real time quantity exceeds max_comm_delay,
accounting for ν variation as well, we can prove that
all broadcast messages are properly received.
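
The sizing argument can be made concrete with a small
Python calculation. It assumes that the elapsed clock time
bb may be as small as duration(broadcast) * (1 − ν), which
is one reading of "accounting for ν variation" and is not
taken from the Ehdm proof. The function names and the
numeric values in the usage note are purely illustrative.

```python
def separation_bound(bb: float, rho: float, delta: float) -> float:
    """Lower bound on the real-time separation of two events whose clock
    readings differ by bb: (1 - rho/2) * |bb| - delta."""
    return (1 - rho / 2) * abs(bb) - delta


def min_broadcast_duration(max_comm_delay: float, rho: float,
                           delta: float, nu: float) -> float:
    """Threshold D for the nominal broadcast duration under the assumption
    bb >= D * (1 - nu). Solving
        (1 - rho/2) * D * (1 - nu) - delta = max_comm_delay
    for D gives the value returned; any strictly larger duration makes the
    separation bound exceed max_comm_delay."""
    return (max_comm_delay + delta) / ((1 - rho / 2) * (1 - nu))
```

For instance, with max_comm_delay = 1.0, rho = 0.001,
delta = 0.1, and nu = 0.01 (all hypothetical values in the
same time unit), the threshold is slightly above 1.11, so the
broadcast phase must be budgeted noticeably longer than the
raw communication delay.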

5.3 Proof Mechanization 

All proofs sketched above as well as the other RCP
proofs have been carried out with the assistance of
Ehdm [Butler 1992]. Although the first phase of this
work was accomplished without the use of an automated
theorem prover [Di Vito 1990], we found the
use of Ehdm beneficial to this second phase of work
for several reasons.

• Increasingly detailed specifications emerge in 
the lower level models. 

• The strictness of the Ehdm language forced us 
to elaborate the design more carefully. 

• Most proofs are not very deep but contain sub- 
stantial detail. Without a mechanical proof 
checker, it would be far too easy to overlook a 
flaw in the proofs. 

• The proof support environment of Ehdm assures 
us that our proof chains are complete and we 
have not overlooked some unproved lemmas. 

• The decision procedures for linear arithmetic 
and propositional calculus relieved us of the 
need to reduce many formulas to primitive 
axioms of arithmetic. Especially useful was 
Ehdm’s reasoning ability for inequalities. 

6 Conclusion 

We have described a formalization of the synchroniz- 
ing aspects of a reliable computing platform (RCP). 
The top level specification is extremely general and 
should serve as a model for many fault-tolerant sys- 
tem designs. The successive refinements in the lower 
levels of abstraction introduce, first, processor repli- 
cation and voting, second, interprocess communica- 
tion by use of dedicated mailboxes, and finally, the 
asynchrony due to separate clocks in the system. 

Key features of the overall RCP work completed 
during Phase 2 and improvements over the results of 
Phase 1 include the following. 

• Specification of redundancy management and
transient fault recovery is based on a very
general model of fault-tolerant computing similar
to one proposed by Rushby [Rushby 1991,
Rushby 1992], but using a frame-based rather
than task-based granularity of synchronization.

• Specification of the asynchronous layer design 
uses modeling techniques based on a time- 
extended state machine approach. This method 
allows us to build on previous work that for- 
malized clock synchronization mechanisms and 
their properties. 

• Formulation of the RCP specifications is based
on a straightforward fault model, providing a
clean interface to the realm of probabilistic
reliability models. It is only necessary to determine
the probability of having a majority of working
processors and a two-thirds majority of nonfaulty
clocks.

• A four-layer tier of specifications has been com- 
pletely proved to the standards of rigor of the 
Ehdm mechanical proof system. The full set of 
proofs can be run on a Sun SPARCstation in 
less than one hour. 

• Important constraints on lower level design and
implementation constructs have been identified
and investigated.

Based on the results obtained thus far, work will
continue to a Phase 3 effort, which will concentrate
on completing design formalizations and developing the
techniques needed to produce verified implementations
of RCP architectures.